CUDA C++ Debugging: Safer GPU Kernel Programming (Generative AI Programming in C++) by Spuler David
Author: Spuler, David
Language: eng
Format: epub
Publisher: Aussie AI Labs
Published: 2024-10-15T00:00:00+00:00
Let's examine these issues in turn.
Threads-per-Block Multiple of 32
The number of threads per block (aka the "block size") should be a multiple of the warp size, which is 32 threads. Hence, it can be as low as 32, but block sizes of 256 or 512 are commonly recommended for real-world kernels. The maximum permitted by CUDA is 1024 threads per block.
This is not good:
float v[54];
int n = 54;
...
int blocks = 2;
mykernel<<<blocks, 27>>>(v,n); // BAD: 27 is not a multiple of 32
This might actually work, but it's very inefficient, and offends the sensibilities of any experienced CUDA programmer, for reasons discussed below.
But first, note that this is worse:
float v[54];
int n = 54;
...
int blocks = 54;
mykernel<<<blocks, 1>>>(v,n); // BAD: one thread per block
If the threads per block is not 32, or a multiple of 32, some threads in a warp won't be properly utilized (or might be doing the wrong thing). The reason is that CUDA schedules threads in "warps" that contain exactly 32 threads. With 27 threads per block, each block's warp has 5 unused threads, and with 1 thread per block, 31 of the 32 are wasted. So, instead, you want something more like this:
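The arithmetic behind those counts of 5 and 31 can be sketched on the host side. This is only an illustration: the names kWarpSize, warps_per_block, and idle_lanes_per_block are hypothetical helpers, not part of the CUDA API, and the warp size of 32 is taken from the text above.

```cpp
// Hypothetical host-side helpers (not CUDA API calls).
constexpr int kWarpSize = 32;  // warp size stated in the text

// Whole warps that one block occupies: threads rounded up to a full warp.
constexpr int warps_per_block(int threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Threads in those warps that have no work assigned to them.
constexpr int idle_lanes_per_block(int threads_per_block) {
    return warps_per_block(threads_per_block) * kWarpSize - threads_per_block;
}

// Compile-time checks against the cases discussed in the text.
static_assert(idle_lanes_per_block(27) == 5, "27 threads leave 5 idle");
static_assert(idle_lanes_per_block(1) == 31, "1 thread leaves 31 idle");
static_assert(idle_lanes_per_block(32) == 0, "a full warp wastes nothing");
static_assert(idle_lanes_per_block(64) == 0, "two full warps waste nothing");
```

Any block size that is a multiple of 32 gives zero idle threads, which is the point of the rule.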
float v[64];
int n = 64;
...
int blocks = 2;
mykernel<<<blocks,32>>>(v,n); // BETTER
Or you can do this:
float v[64];
int n = 64;
...
int blocks = 1;
mykernel<<<blocks,64>>>(v,n); // BETTER
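CUDA schedules threads 32 at a time (a warp). If a block uses fewer threads than that, the hardware still occupies a whole warp, and the unused threads in it are unavailable to run anything else, which is a slug. And if you round up and launch more threads than you have data for, the extra threads still run, which is a bug unless the kernel guards against it, and a slug either way.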
Too Few Blocks
You don't always know the incoming size N of your vector data structure (well, actually, you often do in AI engines, because they have static dimensions, but anyway). Let's try to generalize our computation of how many blocks we need when each has a fixed number of threads. There are a few basic points:
Each block has the same number of threads (i.e., the threads-per-block).
You can't run half a block (all its threads will run, even if you don't need that many).
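One common way to act on these two points is ceiling division: round the block count up so that every element is covered, and accept that the last block may have surplus threads. The sketch below is a host-side illustration under that assumption. The helper names blocks_needed and surplus_threads are hypothetical, and this is not necessarily the formula the book goes on to present.

```cpp
#include <cassert>

// Hypothetical helper: smallest number of blocks whose threads cover n elements.
int blocks_needed(int n, int threads_per_block) {
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;  // ceiling division
}

// Threads launched beyond the n elements. Since a partial block cannot be
// launched, these threads exist and must be guarded against in the kernel.
int surplus_threads(int n, int threads_per_block) {
    return blocks_needed(n, threads_per_block) * threads_per_block - n;
}
```

For example, n = 54 with 32 threads per block needs 2 blocks and leaves 10 surplus threads. A launch along the lines of mykernel<<<blocks_needed(n, 32), 32>>>(v,n) would then rely on the kernel checking its index against n before touching v, which is an assumption about mykernel rather than something shown in the text.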